Evaluation of the invasiveness of pure ground-glass nodules based on dual-head ResNet technique

Objective To intelligently evaluate the invasiveness of pure ground-glass nodules with multiple classifications using deep learning. Methods pGGNs in 1136 patients were pathologically confirmed as lung precursor lesions [atypical adenomatous hyperplasia (AAH) and adenocarcinoma in situ (AIS)], minimally invasive adenocarcinoma (MIA), or invasive adenocarcinoma (IAC). Four different models [EfficientNet-b0 2D, dual-head ResNet_3D, a 3D model combining three features (3D_3F), and a 3D model combining 19 features (3D_19F)] were constructed to evaluate the invasiveness of pGGNs using the EfficientNet and ResNet networks. The Obuchowski index was used to evaluate the differences in diagnostic efficiency among the four models. Results The patients with pGGNs (360 men, 776 women; mean age, 54.63 ± 12.36 years) included 235 cases of AAH + AIS, 332 cases of MIA, and 569 cases of IAC. In the validation group, the areas under the curve in detecting the invasiveness of pGGNs as a three-category classification (AAH + AIS, MIA, IAC) were 0.8008, 0.8090, 0.8165, and 0.8158 for EfficientNet-b0 2D, dual-head ResNet_3D, 3D_3F, and 3D_19F, respectively, whereas the accuracies were 0.6422, 0.6158, 0.651, and 0.6364, respectively. The Obuchowski index revealed no significant differences in the diagnostic performance of the four models. Conclusions The dual-head ResNet_3D_3F model had the highest diagnostic efficiency for evaluating the invasiveness of pGGNs in the four models. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-024-12823-4.


Introduction
Lung cancer is the leading cause of cancer-related mortality globally, and the most common subtype is lung adenocarcinoma [1,2].The new classification of lung tumors issued by WHO in 2021 classified lung epithelial tumors as lung precursor lesions [atypical adenomatous hyperplasia (AAH) and adenocarcinoma in situ (AIS)], minimally invasive adenocarcinoma (MIA), or invasive adenocarcinoma (IAC) [3].Studies have confirmed that persistent pure ground-glass nodules (pGGNs) can be pathologically diagnosed as pre-invasive carcinoma or IAC [4,5], and 30-40% of patients with resected pGGNs are IAC [6,7].Therefore, the imaging features of different histopathological subtypes manifesting as pGGNs on computed tomography (CT) overlap significantly.
It is generally known that lung precursor lesions can be followed up without urgent surgical resection.In addition, MIA can be followed up without surgery or treated by partial resection via lung lobectomy.Meanwhile, IAC should be treated with an appropriately extended scope of resection or complete lobectomy if necessary, and lymph node dissection is required [8][9][10].AIS and MIA have a 5-year disease-free survival rate of 100% [11,12], whereas that for IAC ranges from 33.3 to 99.0% depending on the major histological subtype [13].Therefore, it is extremely important to accurately distinguish lung precursor lesions, MIA, and IAC in the new classification to select the best treatment to optimize patient prognosis.
Thus far, studies on the infiltration of pGGNs based on morphological changes on CT have revealed that a larger diameter and mass, higher attenuation value, irregular shape, lobulation, spiculation, cavitation, and internal vascular changes are closely related to the degree of infiltration [5,7,[14][15][16][17][18][19].However, it remains difficult to clarify the invasiveness of pGGNs by CT.The possible reasons are as follows: (1) measurements of size and assessments of signs are susceptible to multiple factors (e.g., reproduction of measurement, scanning parameter, reconstruction algorithm, observer experience); (2) some signs have a low incidence in pGGNs and the ability to identify lung precursor lesions is limited; and (3) most features in pre-invasive adenocarcinoma and IAC overlap.
In recent years, increasing numbers of scholars have focused on the application of radiomics for clarifying the invasiveness of pGGNs.Fan et al. [20].reported that radiomics had good predictive performance in differentiating IAC and non-invasive adenocarcinoma from ground-glass nodules (GGNs).In the external validation group, the area under the curve (AUC) and accuracy (ACC) were 0.936 and 0.881, respectively, suggesting that radiomics could be used as a non-invasive modality for determining follow-up and treatment strategies for lung adenocarcinoma.Other studies demonstrated the good performance of radiomics in predicting IAC with pGGNs and pGGNs invading the pleura, and the AUCs were 0.72 and 0.862, respectively, in the validation group [21,22].The radiomic nomogram is helpful for evaluating the invasiveness of pGGNs.It also provides valuable reference information for follow-up and elective surgical treatment.However, radiomic features are derived from manual segmentation, and 3D tumor segmentation is a complex and time-consuming process.At the same time, small blood vessels and bronchi must be avoided during the procedure, but the remaining blood vessels can affect the accuracy of some features.Therefore, both radiological and radiomic models have limitations such as subjective judgment, manual segmentation, and time and labor consumption, which affect the prediction results.
To date, some studies have revealed that artificial intelligence, including convolutional neural networks, is useful for the detection, characterization, and identification of GGNs on CT [23,24].Zhao et al. [25] reported that the 3D DenseSharp Network automatically predicted the invasiveness of subcentimeter GGNs (three-category classification) with an ACC of 0.641 and achieved better classification performance than radiologists.Gong et al. [26].used the deep residual learning network to predict the invasiveness of GGNs (two-category classification: pre-invasive lesions and IAC), and the AUC was 0.92 ± 0.03, reflecting an improved prediction performance for IAC.Lv et al. [27] reported that 3D conventional networks automatically predicted the invasiveness of GGNs (two-category classification), and the AUC was 0.862, similar to that for intraoperative frozen-section analysis.At present, research has used simple 3D deeplearning technology to explore the invasiveness of GGNs.
To the best of our knowledge, no prior study applied a deep learning technique to assess the degree of infiltration of pGGNs.The deep learning network was adopted in the study, which can automatically identify, monitor, qualitative and prognostic judgment of GGNs on CT, with high repeatability, good stability and short time, and can effectively overcome the limitations of subjective judgment, artificial segmentation, time and energy consumption.Therefore, based on EfficientNet-b0 and dualheaded ResNet networks and combinations of different clinical features, four different models were constructed to evaluate the invasiveness of pGGNs and provide a basis for accurate diagnosis and treatment.

Patient data
In total, 4073 patients with surgically confirmed lung tumors were identified from July 2013 to July 2022.The inclusion criteria were as follows: (1) pathological confirmation of the lesion as IAC, MIA, AIS, or AAH; (2) chest CT revealed pGGNs without solid parts; (3) use of thin-slice chest CT (thickness ≤ 1.5 mm); and (4) no treatment before surgery.The exclusion criteria included a duration of more than 1 month between CT and surgery, incomplete clinical data, and poor CT image quality.Finally, 1136 patients were included in the study.If multiple pGGNs were surgically treated in one patient, one nodule was selected as the study object with infiltration as the main factor and size as the auxiliary factor.Details of the inclusion and exclusion criteria are presented in Fig. 1.

Image processing and model construction
In this study, the EfficientNet-b0 and dual-head ResNet networks were used.The entire structure of the 3D model dual-head Res2Net network is presented in Fig. 2. The specific image processing and model construction are presented in the Supplementary file 1.

Clinical data and imaging analysis
Clinical data, including sex, age, smoking status, operation status, underlying diseases, tumor markers, and pulmonary diseases, were obtained from the patients' medical records.CT images were evaluated by two senior physicians with experience in chest diseases (with 12 and 18 respective years of experience), who agreed on the results.CT data were analyzed for the following variables: distribution, lobulation, spiculation, cystic airspace, air bronchogram type, pleural contact type, long diameter (LD), short diameter (SD), maximum CT attenuation (CTmax), minimum CT attenuation (CTmin), mean CT attenuation (CTmean), and standard deviation of CT attenuation (CTsd).In addition, LD and SD were measured for each lesion on the axial image.The CT values were the mean values of multiple measurements taken by two physicians.Pleural contact was divided into three types: direct contact (Type I), pleural tag (Type II), and Type I + Type II (Type III).Four models including the 2D model, 3D model, 3D model combining three clinical features (sex, age, smoking status; 3D_3F model), and 3D model combining 19 clinical features (sex, age, smoking status, operation status, underlying diseases, tumor markers, pulmonary diseases, distribution, lobulation, spiculation, cystic airspace, air bronchogram type, pleural contact type, LD, SD, CTmax, CTmin, CTmean and CTsd).

Statistical analysis
SPSS 25.0 software was used for statistical analysis.Qualitative data were presented as the frequency (component ratio), and the χ 2 test was used for intergroup comparisons.Pearson's correlation analysis was selected when the minimum expected value was greater than 5, and Spearman's correlation analysis was selected when the minimum expected value was less than or equal to 5 and greater than 1.The Shapiro-Wilk test was used to determine whether the quantitative data were normally distributed.Binary quantitative data with a normal distribution were presented as the mean ± standard deviation, and comparisons between the groups were performed using an independent-samples t-test.Tripartite quantitative data with a normal distribution were presented as the mean median (interquartile range), and comparisons between the groups was performed by ANOVA.The quantitative data of two categories that did not conform to a normal distribution were presented as the median (interquartile range), and the Mann-Whitney U test was used for intergroup comparisons.The quantitative data of three categories that did not conform to a normal distribution were presented as the median (interquartile range), and intergroup comparisons were performed using the Kruskal-Wallis H test (Welch test).The diagnostic performance of artificial intelligence models was evaluatedby the area under the receiver operating characteristic (ROC) curve (AUC), Accuracy(ACC) and other evaluation metrics, such as Recall, Precision, F1-Score.Python software was used to construct the 2D and 3D models and evaluate model efficiency.

General clinical data of the included patients
The clinical data of the 1136 patients are presented in Table 1.The numbers of patients in the training set with AAH + AIS, MIA, and IA were 155, 248, and 392, respectively, whereas the numbers in the validation set were 80, 84, and 177, respectively.Concerning the general clinical data, the mean patient age significantly differed among the groups in both the training and validation sets (both P < 0.05).The incidence of IAC increased with increasing age.In addition, the sex ratio, prevalence of underlying diseases, and tumor marker levels significantly differed among the groups in the training set (all P < 0.05).The distribution of each index was balanced in the training and validation sets.

CT findings of the study cohort
The CT findings of the 1136 patients are presented in Table 2. Lobulation, spiculation, air bronchogram type, LD, SD, CTmax, CTmin, CTmean, and CTsd significantly differed among the groups in the training and validation sets (all P < 0.05) (Fig. 3).The lesion distribution, cystic airspace, and pleural contact type significantly differed among the groups in the training set (all P < 0.05).The distribution of each index was balanced in the training and validation sets.

Comparison of the efficiency of the four models
Figure 4 presents the ROC curves for the four models in the training set.The AUCs for detecting the invasiveness of pGGNs as a three-category classification were 0.7827, 0.8116, 0.8207, and 0.8333 for the Efficient-Net-b0 2D, dual-head ResNet_3D, 3D_3F, and 3D_19F models, respectively.Furthermore, the ResNet_3D_3F model has slightly higher diagnostic efficiency than the ResNet_3D_4F and ResNet_3D_6F models (Supplementary Fig. S7 and S8).As presented in Table 3, statistical differences are detected in diagnostic efficiency between the 2D and 3D models using the Obuchowski index (P < 0.05).
Figure 5 presents the ROC curves of the four models in the validation set.The AUCs for detecting the invasiveness of pGGNs as a three-category classification are 0.80080, 0.80900, 0.81650, and 0.8158 for the EfficientNet-b0 2D, dual-head ResNet_3D, 3D_3F, and 3D_19F models, respectively.Among the four models, the dual-head ResNet_3D_3F model has the highest diagnostic efficiency.Table 4 illustrates that the diagnostic efficiencies of the four models do not differ as tested by the Obuchowski index, but the diagnostic efficiency is higher for the 3D models than for the 2D model.
The confusion matrix for each model is presented in Fig. 6.The figure presents the interval diagram of the distribution of the number of three-category classifications predicted by each model.Table 5 shows the diagnostic efficiencies of the four models in the training set and the validation set.Among the four models, the prediction efficiencies in descending order are Dual-head ResNet_3D_3F, Dual-head ResNet _3D _19F, Dual-head ResNet _3D and Dual-head ResNet_2D, respectively.

Discussion
To the best of our knowledge, no previous studies used a deep learning technique to specifically evaluate the invasiveness of pGGNs.Therefore, this is the only study to evaluate the invasiveness of pGGNs (three-category classification) using the ResNet network and combining different clinical features.We found that dual-head ResNet_3D_3F model had the highest diagnostic efficiency, whereas there were no significant differences in the diagnostic efficiency compared to other models.In clinical practice, the dual-head ResNet_3D_3F model can be used to evaluate the invasiveness of pGGNs.The results of this study are helpful to intelligently evaluate the invasiveness of pure ground-glass nodules, and provide methodological reference for future related studies.
This study illustrated that the incidence of IAC increases with age, consistent with previous findings [28,29].In this group, a mean age exceeding 59 years was useful for categorizing the invasiveness of pGGNs.Some studies revealed that the smoking status is significantly associated with the invasiveness of GGNs [30], contradicting the current findings.It has been reported in the literature that the smoking status is closely related to the growth of lung cancer manifested as GGNs and linked to increased invasiveness [30,31].Smokers could be more likely to develop lung adenocarcinoma with subsolid GGNs.In this group, the study population was mainly  women, and the overall proportion of smokers was low.These findings could explain the non-significant association of the smoking status with pGGN invasiveness.Most of the previous literature illustrated that lobulation and spiculation had significance in evaluating the invasiveness of pGGNs [7,16,32] and subsolid GGNs [12,33], in line with the current study findings.However, the utility of the air bronchogram sign in differentiating the invasiveness of GGNs has been inconsistent [29,[33][34][35].In this group, all patients with air the bronchogram sign had MIA or IAC, and the air bronchogram sign with deformation was more common in IAC.
Therefore, we concluded that the air bronchogram sign with deformation was a strong indicator of IAC.Lesion size and density were closely correlated with the invasiveness of GGNs [15,16,34,36].Hsu et al. [14] found that LD, SD, and the average diameter had significance in identifying the invasiveness of pGGNs, similar to our findings.CT data are important indicators of the invasiveness of pGGNs [16,37,38], in line with the present data.CTsd reflects the difference in tumor internal density, and greater values indicate greater tumor heterogeneity, which has not been reported in other studies.
With BD 0 The detection of microscopic changes and hierarchical   3 CT images showed three pure ground glass nodules (arrows), pathologically representing AIS, MIA and IAC, respectively.Among them, cystic airspace was seen on the MIA nodule, and lobulation on the IAC nodule features invisible to the human eye is one of the advantages of deep learning, and this helps to better distinguish the subtypes of lung cancer [25,26,39,40].The present study used 2D and 3D models in addition to models combining different clinical features (3 F, 4 F, 6 F, 19 F), differing from the simple 3D deep learning network models reported in other studies [25][26][27].In this cohort, the ACCs of simple 2D and 3D models for predicting the invasiveness of pGGNs (three-category classification) were 0.6422 and 0.6158, respectively, consistent with the results reported by Zhao [25] and Yu [41].The AUC and ACC for pGGN invasiveness (three-category classification) evaluated by the dual-head ResNet_3D_3F model  Fig. 6 The confusion matrix for the four models were 0.8165 and 0.651, respectively, exceeding the values obtained using other models.We found that precision was more important in the evaluation than the number of clinical features combined.In clinical practice, sex, age, and the smoking status are easy to obtain and simple to implement, highlighting their good prospects for use in future clinical practice.In the dual-head ResNet_3D_3F model, the positive prediction efficiency of lung precursor lesions was the highest (76.9%), followed by IAC (73.7%) and MIA (42.2%).Therefore, patients with lung precursor lesions benefited the most from this model, which can effectively alleviate the anxiety of these patients; Secondly, the benefit was IAC, which required shorter follow-up time or elective surgery.
The ResNet network was used in this study, and it adopted a multi-channel residual connection in the residual block in place of a single-channel residual connection [42].In this manner, the ResNet network can obtain fine-grained multi-scale features and increase the receptive field of the network.This study differed from prior research by improving multi-scale capability through the use of different resolutions [43,44].At the same time, the ResNet module can be inserted into other classic backbone networks such as ResNet [45], ResNeXt [46], and DLA [47] to achieve better results.The dual-head ResNet model adopted in this study first extracts the feature vectors of medical images using the ResNet network and uses the fully connected network to classify the feature vectors, and finally, the classification results obtained by the dual-head model are weighted and combined to obtain the final classification results to obtain better results.Meanwhile, the simple 2D model uses the Effi-cientNet-B0 network model, the core structure of which is the mobile inverted bottleneck convolution module [48].
Our study had multiple limitations.First, the ResNet network used in this study is relatively simple.It was not compared with other networks, such as ResNet, UNet, and DenseNet, nor was it integrated with other networks.Further research is necessary.Second, external validation was not introduced, which may affect model generalization.Next, we will actively collaborate with more hospitals to conduct external validation to determine the generalization of the model.Finally, the weak explanation of deep learning techniques is also a limitation of this study.The internal mechanisms of deep learning systems should be explored in future studies.

Conclusion
Four different models evaluating pGGNs invasiveness (three-category classification) have good diagnostic efficacy, in which dual-head ResNet_3D_3F model had the highest diagnostic efficiency.In clinical practice, the dual-head ResNet_3D_3F model can be applied to evaluate pGGN invasiveness, thereby providing a valuable tool for accurate diagnosis in patients.

Fig. 2 Fig. 1
Fig. 2 The entire structure of the 3D model dual-head ResNet network

1 ) 1 *
Note-Unless otherwise indicated, the data are qualitative variables, the number of patients are outside the brackets, the percentages are inside the brackets, and the statistical values are Pearson's χ 2 test ( †), Spearman's χ 2 test ( ‡ ) and Fisher's exact test ( ♀ );AAH = atypical adenomatous hyperplasia, AIS = adenocarcinoma in situ, MIA = minimally invasive adenocarcinoma, IA = invasive adenocarcinoma, HE = hypertension, HL = hyperlipidemia, HC = hyperglycemia, VPI = Ventilation perfusion imbalance † Pearson was used for chi-square test results with minimum expected value greater than 5 ‡ Spearman was selected for the minimum expected value greater than 1 and less than 5 in chi-square test results ♀ Fisher's was used for chi-square test results with minimum expected value greater than Data do not conform to normal distribution, the median are outside parentheses, while the lower quartile and the upper quartile are in parentheses, and the statistical values of two or three group are obtained from Mann-Whitney U test or Kruskal-wallis H test, respectively

Fig. 5
Fig. 5 ROC curves of the four models in the validation set Fig. 4 ROC curves for the four models in the training set

Table 1
The clinical characteristics of 1136 patients in both training and validation sets

Table 2
The CT features of 1136 patients in both training and validation sets Note BD = bronchial deformation, LD = long diameter, SD = short diameter, CTmax = maximum of CT attenuation value, CTmin = minimum of CT attenuation value, CTmean = mean of CT attenuation value, CTsd = standard deviation of CT attenuation, HU = hounsfield unit.The adoption of statistical values is consistent with Table1

Table 3
The Obuchowski index results of the four models in training set

Table 4
The Obuchowski index results of the four models in validation set

Table 5
Diagnostic efficiencies of four models in training set and validation set Note AUC = Area under the ROC curve, ACC = Accuracy, R = Recall, P = Precision, F1 = F1-Score